1. Introduction


This final report presents all work that the Silvus-UCLA team has conducted on AFOSR STTR
Contract FA9550-05-C-0103, titled "Throughput Optimization via Adaptive MIMO Communications."
This report includes and supersedes all previous quarterly progress reports, and it covers the work
completed in the last performance period, from April 1, 2006 to May 31, 2006. The work performed is
divided into the following five categories, each of which is reported in a separate section.


• End-to-end Matlab packet simulation platform.
• Low density parity check code (LDPCC).
• Field trials with the Silvus DSP MIMO testbed.
• High mobility extension.
• Preparation for FPGA real-time implementation.


To be self-contained, the report starts with a brief problem statement and a description of our approach.


Identification and Significance of the Problem 


Since the early work of Foschini and Gans [6][7] and Telatar [11] on the capacity of multiple antenna
radio (MAR), a large body of work has shown the potential of MIMO-based communications to deliver
unprecedented spectral efficiency in rich multipath environments. To a large extent these studies have
been theoretical or simulation based [12][13]. A few experimental MIMO systems have also been
reported in the literature, including some by members of the Silvus team [14][8][9]. However, these
trials have been limited mostly to controlled environments, predominantly indoors, with only a few in
outdoor mobile environments.


The application of MIMO communications to UAV-based communications is not fully
understood. Indeed, a UAV-borne MIMO link poses several unique challenges, both from an
algorithmic/protocol point of view and from a hardware architecture point of view. For a UAV-based
system operating in dense urban environments, whether below or above the building clutter, the following
unique challenges exist:

• A highly dynamic channel due to the high mobility of the UAV,
• A channel whose capacity and degrees of freedom vary significantly with UAV altitude,
• A diversity of mission requirements, including
  o Use in reconnaissance missions behind enemy lines (10s of kbps)
  o Use as a stove-pipe relay node (a few Mbps)
  o Use as an element of a backbone node (100s of Mbps),
• A diversity of UAV platforms in use by the Air Force,
• A potential deployment in a mobile ad-hoc network (MANET).


The high mobility of the UAV and the altitude dependence of its channel imply that the radio
system must be adaptive in both time and space to exploit these characteristics. The diversity of
mission requirements implies that the radio must adapt its bandwidth, spectral efficiency, and
carrier frequency to succeed in all scenarios. In addition, the ability to be used on a variety of UAV
platforms and to integrate easily into a mobile ad-hoc network (MANET) environment mandates a radio
system that is adaptive in both hardware and protocol.

The work will culminate in a complete system level design as well as a scalable hardware architecture 
that will allow the proposed MIMO system to scale with the capabilities of the UAV and the mission 
requirements. 


2. Approach 


This section outlines our general approach to addressing the needs of the UAV based MIMO 
communications system. 


High throughput communications will be addressed by recognizing that MIMO techniques, although
quite powerful, are only one element of an overall physical layer solution. MIMO must be designed to
operate in harmony with, and complement, the other physical layer parameters. As such, the highest
possible throughput under given environmental conditions will be achieved through the
proper choice of:


• MIMO processing (spatial multiplexing, space-time coding, diversity processing, or
  smart antenna processing)
• FEC code and rate
• Constellation size and type of modulation
• Signal bandwidth
• Carrier frequency


Our system implements all four variants of multi-antenna techniques (spatial multiplexing, space-time
coding, diversity processing, and smart antenna processing). To get as close to the Shannon
capacity as possible, we incorporate advanced LDPC (low density parity check) codes. Because the
power of LDPC codes comes at the price of decoder complexity, we also incorporate bit-interleaved
convolutional codes into our system requirements, which allows low-complexity decoding over good
channels. To combat multipath and to ensure high spectral utilization, we have
adopted an OFDM-based signaling scheme. Over the past several years OFDM has become the
modulation of choice for both wireline and wireless communication systems; it has been adopted in the
DVB standards, in wireless LAN, and in other standards.


The diversity of mission requirements dictates a high degree of adaptability in the physical layer and in the
radio architecture itself. Moreover, the widely varying conditions of the wireless channel call for a great
degree of configurability and robustness from the radio unit. Our approach to addressing the diversity of
mission requirements is to incorporate a great deal of adaptability into the radio system, including
adaptability in the modulation format, coding rate, bandwidth, carrier frequency, and MIMO processing.
Underlying the adaptability of the signal bandwidth and center frequency is a highly agile radio
architecture that will be explored as part of the proposed work.


High Doppler communications refers to the case where the channel changes rapidly. In
general, the Doppler frequency $f_D = v/\lambda$, where $\lambda$ is the carrier wavelength and $v$ is the relative
speed between the transmitter and receiver, represents the rate at which the channel changes. We
overcome the limitations of extreme mobility through a pilot symbol assisted modulation technique.
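As a worked illustration of $f_D = v/\lambda$, the short Python sketch below converts a speed in MPH into a Doppler frequency at the 2.4 GHz carrier used in the simulations and normalizes it by an OFDM symbol duration. The 4 us symbol duration (3.2 us FFT period plus 0.8 us guard interval) is our assumption; with it, 500 MPH gives roughly the 0.72% normalized Doppler quoted in Section 6.

```python
SPEED_OF_LIGHT_MPS = 299_792_458.0
MPH_TO_MPS = 0.44704  # 1 mile per hour in metres per second


def doppler_hz(speed_mph: float, carrier_hz: float) -> float:
    """Maximum Doppler frequency f_D = v / lambda."""
    wavelength_m = SPEED_OF_LIGHT_MPS / carrier_hz
    return speed_mph * MPH_TO_MPS / wavelength_m


def normalized_doppler(speed_mph: float, carrier_hz: float, symbol_s: float = 4e-6) -> float:
    """f_D * T_s, assuming (hypothetically) a 4 us OFDM symbol including the guard interval."""
    return doppler_hz(speed_mph, carrier_hz) * symbol_s


if __name__ == "__main__":
    for mph in (5, 100, 500):
        fd = doppler_hz(mph, 2.4e9)
        print(f"{mph:>4} MPH: f_D = {fd:7.1f} Hz, f_D*T_s = {100 * normalized_doppler(mph, 2.4e9):.3f} %")
```

At 500 MPH this gives a Doppler frequency of roughly 1.79 kHz, i.e. the channel decorrelates within a few hundred OFDM symbols.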
Radio form factor sufficiently small to fit into a UAV payload. Although the final size of
the radio is not directly addressed in the Phase I effort, it is nonetheless an important issue that
must be kept in mind in developing the system. As part of the Phase I effort, we searched for available
processing and RF platforms with an eye toward fitting them into the belly of a small UAV.
The processing capability of such platforms may dictate, in Phase II, which MIMO configurations
can be supported.


Over the course of the one-year period of this project, the Silvus and UCLA team has successfully
completed the following tasks:


• Matlab MIMO-OFDM Simulation System.
• Design and Simulation of Low Density Parity Check Code.
• Verification with Silvus DSP MIMO Testbed.
• Design and Simulation for High Mobility Extension.
• Preparation for Real-Time Implementation on FPGA.


Each item will be reported in a separate section hereafter. 


3. Matlab MIMO-OFDM Simulation System 


A packet structure has been developed to support realistic end-to-end physical layer packet simulation.
The packet structure is highly portable and supports interoperability with both SISO and MIMO
enabled nodes. Moreover, given the size and power constraints of a UAV-based communication system,
there may be times when the desired QoS can be met in an energy-efficient SISO mode; the packet
structure does not preclude this. Additionally, we strive to design the packet structure to
minimize overhead, which increases goodput relative to the over-the-air data rate. Figure 1 shows
the top-level packet structure. It consists of a universal SISO header that is understood by all systems.
The Mode Identifier indicates the particular MIMO configuration used for the remainder of the
packet, and the receiver dynamically switches to the proper decoder for the given mode.


[Diagram: on each antenna (ANT-1 through ANT-n) the frame consists of the fields SISO AGC | SISO Chan | Mode | MIMO AGC | MIMO Chan | MIMO DATA]

Figure 1. Physical Layer Frame Structure


The packet consists of six fields. The first three require only one transmit antenna and one
receive antenna.
1. SISO AGC (Automatic Gain Control) Preamble. The SISO AGC field assists the
receiver with packet detection, gain control, and, optionally, coarse frequency and symbol timing
synchronization.


2. SISO Channel Training. The SISO Channel Training field assists the receiver with
symbol timing, carrier frequency synchronization, and channel estimation.


3. Mode Identifier. The Mode Identifier tells the receiver the format of the part of the packet that
follows this field. Table 1 lists all the modes that can be selected in the Mode Identifier. The
Mode Identifier is protected by a CRC so that the receiver does not decode the rest of the packet
based on wrong assumptions.


Table 1. Supported Modes of The Packet Structure 


4. MIMO AGC Field. The MIMO AGC Field assists the receiver in adjusting its gain settings
for the MIMO section of the packet. This field is not transmitted in SISO mode.


5. MIMO Channel Training Field. The MIMO Channel Training Field assists the receiver with
MIMO channel estimation and possibly with refining carrier and timing synchronization.
This field is not transmitted in SISO mode.


6. Data Field. The Data field contains both payload and pilots. Two different pilot schemes are
supported to give the best tradeoff between bandwidth efficiency and support for mobility. In
scenarios with low mobility, a low-order constellation, or a short packet length, a few subcarriers are
dedicated to pilots that help the receiver track phase and frequency offset.
When the channel changes significantly during the packet, a more sophisticated pilot scheme
is used, in which pilot symbols are spread across the frequency-time grid so that the
receiver can use them to estimate and track the channel continuously. For detailed
information on the pilot scheme for high mobility, refer to Section 6.
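The six-field layout above can be summarized in a small Python sketch that lists the fields of a packet for a given mode and accepts the Mode Identifier only when its CRC checks. The mode numbering, the 4-bit mode field, and the CRC-8 polynomial (x^8 + x^2 + x + 1, i.e. 0x07) are hypothetical placeholders chosen for illustration; the report does not specify them, and Table 1 holds the actual modes.

```python
SISO_MODE = 0  # hypothetical code for the SISO mode; the real modes are listed in Table 1


def crc8(bits, poly=0x07):
    """Bitwise MSB-first CRC-8 over a list of 0/1 bits (hypothetical polynomial 0x07)."""
    reg = 0
    for b in bits:
        feedback = ((reg >> 7) & 1) ^ b
        reg = (reg << 1) & 0xFF
        if feedback:
            reg ^= poly
    return reg


def mode_bits(mode, width=4):
    """Encode a mode number as a fixed-width bit list, MSB first (hypothetical 4-bit field)."""
    return [(mode >> k) & 1 for k in range(width - 1, -1, -1)]


def packet_fields(mode):
    """Field order of one packet; MIMO AGC and MIMO channel training are omitted in SISO mode."""
    fields = ["SISO AGC", "SISO Channel Training", "Mode Identifier"]
    if mode != SISO_MODE:
        fields += ["MIMO AGC", "MIMO Channel Training"]
    return fields + ["Data"]


def decode_mode(bits, received_crc):
    """Return the mode number if the CRC checks, else None so the receiver drops the packet."""
    if crc8(bits) != received_crc:
        return None
    mode = 0
    for b in bits:
        mode = (mode << 1) | b
    return mode
```

Gating the decoder switch on the CRC mirrors the purpose stated for the Mode Identifier: a corrupted mode field causes the packet to be dropped rather than decoded with the wrong detector.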
Using the packet structure described above, a complete simulation system has been implemented
efficiently in Matlab. Certain functions are implemented as MEX functions in C/C++ to achieve
good simulation speed. The implementation is highly modularized and parameterized, both for easy upgrade as
the requirements evolve and to provide a simulation environment for testing innovative ideas. The
system is a complete packet-level end-to-end simulation system, including packet generation, all major
hardware/RF impairments, a sophisticated MIMO channel model, and a complete reference receiver. Figure 2
and Figure 3 show the block diagrams of the transmitter and the receiver.


[Diagram: transmitter processing chain, including interleaver blocks]

Figure 2. Functional Block Diagram of the Transmitter
Figure 3. Functional Block Diagram of the Receiver

The simulation system supports the following features:

• Any antenna configuration up to 4 transmit antennas and 4 receive antennas.
• Spatial multiplexing with up to 4 spatial streams.


Final Report: Throughput Optimization via adaptive MIMO 


Contract No. FA9550-05-C-0103 


• Space-time block codes.
• Spatial cyclic delay diversity.
• Hybrid space-time and spatial multiplexing system.
• Binary convolutional code and LDPCC with variable coding rate: 1/2, 2/3, 3/4, 5/6.
• Transmit per-subcarrier beamforming.
• Space-frequency-time three-dimensional interleaver for maximum coding diversity gain.
• Constellation sizes: BPSK, QPSK, 16QAM, 64QAM.
• Soft/hard decision Viterbi decoder.
• Layered fast LDPCC decoder.
• Versatile channel model with programmable delay and power profile and Doppler spread.
• Programmable hardware and RF impairments such as carrier frequency offset, phase noise,
  power amplifier non-linearity, and I/Q imbalance.
• Fixed-point model for key components such as MIMO detection and synchronization.


All these features can be controlled easily by toggling flags and/or setting parameters. For instance, most
parts of the receiver can be instructed to use perfect information, so that the implementation loss can be
assessed module by module. The simulation system also features a complete low-complexity, high-performance
Silvus proprietary receiver that can serve as a design for real-time implementation. The
simulation platform developed by the Silvus-UCLA team has become a valuable tool for trying out
innovative ideas. The rest of this section presents some of the results obtained using this simulation
platform.


Simulation results 


The results presented here come from full-system simulations in various channel scenarios. Each is a
complete physical layer end-to-end packet simulation that includes all the necessary transmitter and
receiver algorithms, as well as the major hardware impairments. All algorithms are practical in terms
of hardware implementation; if a real-time implementation is required, they can be translated
directly into a fixed-point implementation and mapped onto hardware.


The following is a list of the important parameters of the simulations reported here.

• 4 transmit antennas with 4 spatial data streams
• 4 receive antennas
• 20 MHz bandwidth
• 2.4 GHz carrier frequency
• Sub-carrier spacing of 312.5 kHz
• 56 effective carriers: 52 data carriers + 4 pilot carriers
• Bit-interleaved coded modulation with a binary convolutional code and QPSK, 16QAM, or 64QAM
• The Jakes model [21] is used to simulate time-varying fading
• The frequency-selective multipath fading model is based on the temporal multi-clustering model
  proposed in [22], with an exponential delay and power profile
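The packet durations quoted in the figure titles of this section, such as "200 Bytes (16 us)" for 4x4 16QAM R = 1/2, follow from these parameters. The Python sketch below reproduces them under two assumptions not stated in the list: a 64-point FFT (which gives the 312.5 kHz spacing at 20 MHz) with a 4 us OFDM symbol, and no service or tail bits. Exact fractions are used so that rounding up to whole OFDM symbols is not disturbed by floating-point error.

```python
import math
from fractions import Fraction

BANDWIDTH_HZ = 20e6
FFT_SIZE = 64       # assumption: 20 MHz / 64 = 312.5 kHz sub-carrier spacing
SYMBOL_US = 4.0     # assumption: 3.2 us FFT period + 0.8 us guard interval
DATA_CARRIERS = 52
BITS_PER_SUBCARRIER = {"BPSK": 1, "QPSK": 2, "16QAM": 4, "64QAM": 6}

SUBCARRIER_SPACING_HZ = BANDWIDTH_HZ / FFT_SIZE


def bits_per_ofdm_symbol(streams, modulation, rate):
    """Information bits carried by one MIMO-OFDM symbol across all spatial streams."""
    return DATA_CARRIERS * streams * BITS_PER_SUBCARRIER[modulation] * rate


def data_field_us(payload_bytes, streams, modulation, rate):
    """Data-field duration, ignoring service/tail bits and the preamble fields."""
    n_symbols = math.ceil(Fraction(8 * payload_bytes) / bits_per_ofdm_symbol(streams, modulation, rate))
    return n_symbols * SYMBOL_US


def phy_rate_mbps(streams, modulation, rate):
    """Peak PHY rate: bits per OFDM symbol divided by the symbol duration in microseconds."""
    return float(bits_per_ofdm_symbol(streams, modulation, rate)) / SYMBOL_US


if __name__ == "__main__":
    print(data_field_us(200, 4, "16QAM", Fraction(1, 2)))   # figure title: 200 Bytes (16 us)
    print(data_field_us(400, 4, "64QAM", Fraction(2, 3)))   # figure title: 400 Bytes (16 us)
    print(data_field_us(500, 4, "16QAM", Fraction(1, 2)))   # figure title: 500 Bytes (40 us)
```

For example, 4x4 16QAM R = 1/2 carries 416 bits per OFDM symbol, so a 200-byte payload needs 4 symbols (16 us) and the peak PHY rate is 104 Mbps.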
The following hardware impairments were introduced in the simulation.

• Phase noise on both the TX and RX sides. The phase noise model for the IEEE next-generation
  wireless LAN standard is used; it represents the phase noise one would observe in a typical
  inexpensive commercial-grade system. Refer to [19] for the details of this model.
• Power amplifier nonlinearity. The model for IEEE 802.11b is used, with an output power
  backoff of 10 dB. Refer to [20] for details.
• Carrier frequency offset. A frequency offset of 130 kHz is used for all simulations.
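A 130 kHz offset is about 0.42 of the 312.5 kHz sub-carrier spacing, so it must be estimated and removed before the FFT. The report does not describe its frequency estimator; the Python sketch below assumes a classic delay-and-correlate estimator on a training signal that repeats every `period` samples, which is unambiguous for offsets below fs/(2 x period). The 16-sample QPSK pattern is a hypothetical stand-in for the actual SISO training fields.

```python
import cmath
import math

FS_HZ = 20e6             # complex baseband sample rate for the 20 MHz signal
SUBCARRIER_HZ = 312.5e3  # sub-carrier spacing


def apply_cfo(samples, cfo_hz, fs=FS_HZ):
    """Impairment model: rotate sample n by exp(j*2*pi*cfo*n/fs)."""
    return [s * cmath.exp(2j * math.pi * cfo_hz * n / fs) for n, s in enumerate(samples)]


def estimate_cfo(samples, period, fs=FS_HZ):
    """Delay-and-correlate CFO estimate for a signal that repeats every `period` samples.

    Each product conj(r[n]) * r[n + period] has phase 2*pi*cfo*period/fs, so the estimate
    is unambiguous only for |cfo| < fs / (2 * period).
    """
    acc = sum(samples[n].conjugate() * samples[n + period] for n in range(len(samples) - period))
    return cmath.phase(acc) * fs / (2 * math.pi * period)


# Hypothetical 16-sample QPSK training pattern repeated 10 times (0.8 us period at 20 MHz).
PATTERN = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j,
           1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j, 1 + 1j, 1 + 1j, -1 - 1j, 1 - 1j]
TRAINING = PATTERN * 10

if __name__ == "__main__":
    rx = apply_cfo(TRAINING, 130e3)
    print(f"estimated CFO: {estimate_cfo(rx, 16) / 1e3:.3f} kHz")
```

With a 16-sample period the estimator covers offsets up to 625 kHz, comfortably above the simulated 130 kHz; a longer repetition period improves noise averaging but shrinks that range.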


The following list describes the important features of the receiver algorithm.

• A two-stage packet detection and OFDM symbol timing algorithm is used. The first stage, which has
  very low complexity, continuously searches for the SISO AGC preamble in the incoming signal; once
  the preamble is found, a packet is declared detected. The second stage uses the SISO Channel
  Training field to fine-tune OFDM symbol timing. Carrier frequency estimation, noise variance
  estimation, and SISO channel estimation are done in the same stage.
• MIMO channel estimation is done once, using the MIMO Channel Training field. There is no
  channel tracking afterwards. See Section 6 for our simulation results with channel tracking.
• MIMO detection is based on the principle of linear minimum mean square error (LMMSE)
  detection.
• A fast QAM soft demapper with complexity linear in the number of bits is developed and used in
  the simulation.
• Phase noise tracking is performed on a symbol-by-symbol basis using the embedded pilot carriers.
• A soft-input sliding-window Viterbi algorithm is used to decode the convolutional code.
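The LMMSE detection and soft demapping steps can be sketched in Python as follows. This is a minimal illustration, not the Silvus proprietary receiver: it computes $\hat{s} = (H^H H + \sigma^2 I)^{-1} H^H y$ for unit-energy symbols and then applies the standard piecewise max-log LLR expressions for Gray-mapped 16-QAM, whose cost is constant per bit. The Gray mapping (levels -3, -1, +1, +3 carrying bits 00, 01, 11, 10 per axis) and the omission of LMMSE bias correction are assumptions.

```python
import math

SCALE_16QAM = math.sqrt(10.0)  # unit-energy 16-QAM uses levels {-3, -1, 1, 3} / sqrt(10)


def solve(a, b):
    """Solve a x = b (square complex system) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lmmse_detect(h, y, noise_var):
    """s_hat = (H^H H + noise_var * I)^-1 H^H y, assuming unit-energy transmit symbols."""
    n_rx, n_tx = len(h), len(h[0])
    gram = [[sum(h[k][i].conjugate() * h[k][j] for k in range(n_rx)) + (noise_var if i == j else 0.0)
             for j in range(n_tx)] for i in range(n_tx)]
    matched = [sum(h[k][i].conjugate() * y[k] for k in range(n_rx)) for i in range(n_tx)]
    return solve(gram, matched)


def demap_16qam_axis(y, noise_var):
    """Max-log LLRs ln P(b=0)/P(b=1) for one 16-QAM axis (unnormalized levels, O(1) per bit)."""
    if abs(y) <= 2.0:
        llr_b0 = -4.0 * y
    else:
        llr_b0 = -8.0 * (y - math.copysign(1.0, y))
    llr_b1 = 4.0 * (abs(y) - 2.0)  # b1 = 0 on the outer levels +/-3
    return llr_b0 / noise_var, llr_b1 / noise_var


def demap_16qam(symbol, noise_var):
    """Four LLRs for one unit-energy 16-QAM estimate: I-axis bits, then Q-axis bits."""
    z = symbol * SCALE_16QAM
    nv = noise_var * SCALE_16QAM ** 2
    return demap_16qam_axis(z.real, nv) + demap_16qam_axis(z.imag, nv)
```

A negative LLR corresponds to bit 1, matching the sign convention used for the LDPC decoder in Section 4.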


The simulations examine the performance of the whole system from three different
perspectives.

• Performance as a function of Doppler spread for QPSK, 16QAM, and 64QAM at speeds of
  5 MPH, 100 MPH, and 500 MPH. 500 MPH is not simulated for 16QAM and 64QAM because the
  performance is not acceptable even with ideal synchronization and no RF impairments (see
  Figure 5 and Figure 7). A flat fading channel is chosen in this case.
• Performance as a function of delay spread at RMS delay spreads of 0 ns, 15 ns, 50 ns, and 150 ns.
  16QAM and 5 MPH are used for this simulation.
• Performance as a function of packet length for QPSK. The speed is set to 100 MPH.


The results are summarized in the following plots and paragraphs. For comparison, some of the results
for ideal synchronization and no RF impairments are also included.
Performance vs. Speed

[Plot: PER vs. SNR per RX, dB]

Figure 4. Packet Error Rate for QPSK at Different Speeds

[Plot: 4x4, 20 MHz, 16QAM, R = 1/2, 200 Bytes (16 us); PER vs. SNR]

Figure 5. Performance of 16QAM with Ideal Synchronization and No Hardware Impairments

[Plot: 4x4, 20 MHz, 16QAM, R = 1/2, 200 Bytes (16 us)]

Figure 6. Packet Error Rate for 16QAM at Different Speeds

[Plot: 4x4, 20 MHz, 64QAM, R = 2/3, 400 Bytes (16 us)]

Figure 7. Performance of 64QAM with Ideal Synchronization and No Hardware Impairments

Figure 8. Packet Error Rate for 64QAM at Different Speeds


Figure 4, Figure 6, and Figure 8 show the performance vs. speed simulation results. These results show that if
the required packet error rate is 10% with 100-byte packets, QPSK is usable at speeds of 500 MPH and above,
16QAM at speeds of 100 MPH and above, and 64QAM requires 42 dB SNR at a speed of
100 MPH. The RF impairments and the practical synchronization algorithm introduce an approximate loss of 4 dB, 5 dB,
and 12 dB at 10% PER and 100 MPH for QPSK, 16QAM, and 64QAM, respectively. Clearly, if a high-order constellation
such as 64QAM is desired at high speed, the receiver needs to perform channel tracking to combat the time-varying
fading.


Performance vs. Delay Spread

[Plot: 4x4, 20 MHz, 16QAM, R = 1/2, 500 Bytes (40 us); PER vs. SNR per RX, dB]

Figure 9. Packet Error Rate for 16QAM with Different Delay Spreads
Figure 9 shows the performance of the system using 16QAM with the practical receiver and RF
impairments. Interestingly, the diversity that a 15 ns delay spread offers does not show up in the
low-PER region. However, larger delay spreads such as 50 ns or 150 ns offer a significant advantage over flat
fading. For instance, a 50 ns delay spread offers about 7 dB of gain at 10% PER compared to flat fading. The
synchronization and RF impairments introduce about 4 dB of loss.
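One way to see why 15 ns buys little diversity is to compare the coherence bandwidth with the occupied bandwidth. The Python sketch below uses the common rule of thumb B_c ~ 1/(5 tau_rms), which is our assumption and not a result of the report, and divides the roughly 17.5 MHz spanned by the 56 used sub-carriers by B_c as a crude count of independently fading sub-bands.

```python
import math

OCCUPIED_BW_HZ = 56 * 312.5e3  # 56 used sub-carriers x 312.5 kHz = 17.5 MHz


def coherence_bandwidth_hz(rms_delay_s):
    """Rule-of-thumb B_c ~ 1 / (5 * tau_rms); infinite for a flat (zero delay spread) channel."""
    if rms_delay_s <= 0.0:
        return math.inf
    return 1.0 / (5.0 * rms_delay_s)


def rough_frequency_diversity(rms_delay_s):
    """Crude count of roughly independent sub-bands across the occupied bandwidth (at least 1)."""
    return max(1.0, OCCUPIED_BW_HZ / coherence_bandwidth_hz(rms_delay_s))


if __name__ == "__main__":
    for tau_ns in (0, 15, 50, 150):
        bc = coherence_bandwidth_hz(tau_ns * 1e-9)
        print(f"{tau_ns:>3} ns: B_c = {bc / 1e6:6.2f} MHz, ~{rough_frequency_diversity(tau_ns * 1e-9):.1f} sub-bands")
```

Under this heuristic a 15 ns delay spread gives a coherence bandwidth of about 13 MHz, so the 17.5 MHz signal sees little more than one independent fade, whereas 50 ns (B_c of about 4 MHz) gives roughly four.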


Performance vs. Packet Length

[Plot: 4x4, 20 MHz, QPSK, R = 1/2, 100 MPH]

Figure 10. Packet Error Rate of QPSK with Different Packet Sizes


Figure 10 shows that, for a target packet error rate of 10%, the simulated system supports a maximum packet
size of 800 bytes. A significant error floor is observed for packet lengths above 400 bytes.


Conclusions Drawn from the Simulation Results
With QPSK, the current MIMO-OFDM system works reasonably well at speeds below 100 MPH.
For short packets, even higher constellations such as 16QAM and 64QAM, or higher speeds
such as 500 MPH, are possible. If more bandwidth efficiency and/or higher speed is desired, a channel
tracking algorithm designed for time-varying channels is required. This work is reported in Section 6.


4. Design and Simulation of Low Density Parity Check Code


Low density parity check codes are linear binary block codes whose parity check matrices contain
mostly zeros and only a small number of ones. An LDPC code can be described by an M x N parity check
matrix, H, or by a graphical representation called a bipartite graph. The M rows of the parity check matrix
specify the M constraints on the codeword bits, and the N columns define the codeword length. Similarly,
the bipartite graph contains N bit nodes, one for each bit (column of H), and M check nodes, one for each
parity check (row of H). Figure 11 illustrates the parity check matrix, H, for a simple (7, 3) code and
the corresponding bipartite graph, which provides a graphical representation of the parity check matrix and
helps in understanding the iterative soft decoding algorithm. In the bipartite graph of the (7, 3) code
(Figure 11), bit nodes are denoted by circles and check nodes by squares. Each check node is connected to
the bit nodes it checks; in other words, check node j is connected to bit node i
whenever element $h_{ji}$ of H is 1.


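$$
H = \begin{bmatrix}
1 & 1 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & 1
\end{bmatrix}
$$

(columns $v_0, \dots, v_6$ are the bit nodes; rows $c_0, \dots, c_3$ are the check nodes)

Figure 11. Bipartite graph of a (7,3) code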
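The bipartite graph is just an adjacency view of H. The Python sketch below derives, for each check node, the bit nodes it checks and, for each bit node, the check nodes it belongs to (the sets V(j) and C(i) used in the decoding algorithms that follow), and computes the syndrome c H^T over GF(2). It assumes the 4 x 7 matrix below as the (7,3) example of Figure 11.

```python
# Assumed parity check matrix of the (7,3) example: rows c0..c3, columns v0..v6.
H = [
    [1, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
]


def bit_nodes_of_check(h):
    """V(j): the bit nodes connected to each check node j (the ones in row j)."""
    return [[i for i, v in enumerate(row) if v] for row in h]


def check_nodes_of_bit(h):
    """C(i): the check nodes connected to each bit node i (the ones in column i)."""
    return [[j for j, row in enumerate(h) if row[i]] for i in range(len(h[0]))]


def syndrome(h, c):
    """c H^T over GF(2); an all-zero result means every parity check is satisfied."""
    return [sum(row[i] & c[i] for i in range(len(c))) % 2 for row in h]
```

Flipping bit v0 of a valid codeword violates exactly the checks that contain v0, which is what the message-passing decoders exploit.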


In the following, three variations of the LDPC decoding algorithm, namely sum-product, Offset Min- 
Sum, and Layered Decoding algorithms are discussed and corresponding simulation results are presented. 


Sum Product Algorithm 


The sum-product decoding algorithm (SPA) [23][24] works iteratively by passing messages along the
edges of the associated bipartite graph. The messages are log-likelihood ratios (LLRs), where the
sign of a message represents the binary digit and its magnitude denotes the reliability of the message.

Before describing the sum-product algorithm, we introduce the notation. The set $V(j) = \{i : H_{ji} = 1\}$
contains the bit nodes connected to check node $j$, and the set of check nodes connected to bit node $i$ is
denoted $C(i) = \{j : H_{ji} = 1\}$. The set $V(j)$ with bit $i$ excluded is written $V(j)\setminus i$, and the set $C(i)$ with
check $j$ excluded is written $C(i)\setminus j$. $Q_{ij}$ denotes the message sent from bit node $i$ to check node $j$, and
$R_{ji}$ the message passed from check node $j$ to bit node $i$. The sum-product algorithm starts
with an initialization step, after which iterations exchange messages between bit and check
nodes. Decoding stops when all the parity checks are satisfied or the maximum number of iterations is
reached. The main steps of the decoding are summarized as follows.


Step 1, Initialization: Each bit node is assigned an a posteriori log-likelihood ratio,

$$Q_{ij} = \ln\frac{p_i(0)}{p_i(1)} = \frac{2y_i}{\sigma^2},$$

where $p_i(0)$ and $p_i(1)$ are the probabilities that bit $i$ is 0 and 1, respectively, $y_i$ is the received soft bit value,
and $\sigma^2$ is the variance of the channel noise (the second equality holds for BPSK over AWGN with bit 0 mapped to +1).

Step 2, Check Node Operation: The check node to bit node messages, $R_{ji}$, are
calculated according to (1):

$$R_{ji} = \left(\prod_{i' \in V(j)\setminus i} \alpha_{i'j}\right)\,\phi\!\left(\sum_{i' \in V(j)\setminus i} \phi\left(\beta_{i'j}\right)\right) \tag{1}$$

where $\alpha_{ij} = \mathrm{sign}(Q_{ij})$, $\beta_{ij} = |Q_{ij}|$, and

$$\phi(x) = -\ln\left(\tanh\frac{x}{2}\right) = \ln\frac{e^x + 1}{e^x - 1}.$$

Step 3, Bit Node Operation: The bit node to check node messages are computed according to (2):

$$Q_{ij} = \ln\frac{p_i(0)}{p_i(1)} + \sum_{j' \in C(i)\setminus j} R_{j'i} \tag{2}$$

In this step, the soft decision values, $Q_i$, are also computed from (3) and then used in (4):

$$Q_i = \ln\frac{p_i(0)}{p_i(1)} + \sum_{j \in C(i)} R_{ji} \tag{3}$$

Step 4, Syndrome Check: The parity check $\hat{c}H^T$ is computed from the hard
decision variables, $\hat{c}_i$, given by (4):

$$\hat{c}_i = \begin{cases} 1 & \text{if } Q_i < 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

If $\hat{c}H^T = 0$ or the number of iterations reaches the maximum limit, the decoder stops; otherwise the
decoder returns to Step 2 and continues iterating.
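The four steps can be sketched in Python as a flooding-schedule decoder. This is a minimal illustration, not the decoder used for the results below: messages are stored in dictionaries keyed by graph edge, the argument of phi is clamped away from zero to avoid log(0), BPSK with bit 0 mapped to +1 is assumed, and the example matrix is the assumed (7,3) H of Figure 11.

```python
import math

# Assumed parity check matrix of the (7,3) example: rows c0..c3, columns v0..v6.
H = [
    [1, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
]


def phi(x):
    """phi(x) = -ln(tanh(x/2)); x is clamped away from 0 to avoid log(0)."""
    x = max(x, 1e-12)
    return -math.log(math.tanh(x / 2.0))


def spa_decode(h, y, noise_var, max_iter=16):
    """Sum-product decoding of received BPSK samples y; returns the hard-decision codeword."""
    m, n = len(h), len(h[0])
    V = [[i for i in range(n) if h[j][i]] for j in range(m)]   # bit nodes of check j
    C = [[j for j in range(m) if h[j][i]] for i in range(n)]   # check nodes of bit i
    llr_ch = [2.0 * yi / noise_var for yi in y]                # Step 1: 2 y_i / sigma^2
    Q = {(i, j): llr_ch[i] for j in range(m) for i in V[j]}    # bit-to-check messages
    R = {(j, i): 0.0 for j in range(m) for i in V[j]}          # check-to-bit messages
    c_hat = [1 if l < 0 else 0 for l in llr_ch]
    for _ in range(max_iter):
        # Step 2: check node update, Eq. (1)
        for j in range(m):
            for i in V[j]:
                others = [Q[(k, j)] for k in V[j] if k != i]
                sign = math.prod(1.0 if q >= 0 else -1.0 for q in others)
                R[(j, i)] = sign * phi(sum(phi(abs(q)) for q in others))
        # Step 3: soft decisions, Eq. (3), and bit-to-check messages, Eq. (2)
        q_total = [llr_ch[i] + sum(R[(j, i)] for j in C[i]) for i in range(n)]
        for i in range(n):
            for j in C[i]:
                Q[(i, j)] = q_total[i] - R[(j, i)]
        # Step 4: hard decisions, Eq. (4), and syndrome check
        c_hat = [1 if q < 0 else 0 for q in q_total]
        if all(sum(c_hat[i] for i in V[j]) % 2 == 0 for j in range(m)):
            break
    return c_hat
```

Computing Eq. (2) as the total of Eq. (3) minus the message from check j avoids re-summing C(i)\j for every edge.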


Offset Min-Sum Algorithm 


The SPA is the best-performing, yet the most complex, algorithm for decoding LDPC codes. Over
the last decade, various complexity reduction schemes for decoding LDPC codes have been studied.
One promising reduced-complexity decoding scheme is the Min-Sum algorithm [25][26], which does not
require any channel state information and involves only additions and comparisons. To
improve the accuracy of the check node operation in the Min-Sum algorithm, the output reliability values
can be reduced by a positive constant $\eta$. This approach is called the Offset Min-Sum (OMS) algorithm and
is implemented simply by replacing Equation (1) of the SPA with (5).


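$$R_{ji} = \left(\prod_{i' \in V(j)\setminus i} \alpha_{i'j}\right)\,\max\left\{\min_{i' \in V(j)\setminus i} \beta_{i'j} - \eta,\; 0\right\} \tag{5}$$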
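The difference between (1) and (5) is easiest to see on a single check node. The Python sketch below computes both updates for every edge of one check node from the same incoming bit-to-check messages; the offset value eta = 0.15 used in the example is hypothetical. The exact SPA output magnitude is always at most the smallest incoming magnitude, which is why min-sum overestimates reliability and why subtracting a small offset helps.

```python
import math


def spa_check_update(msgs):
    """Eq. (1): exact sum-product check-node outputs for every edge of one check node."""
    def phi(x):
        x = max(x, 1e-12)
        return -math.log(math.tanh(x / 2.0))

    out = []
    for i in range(len(msgs)):
        others = msgs[:i] + msgs[i + 1:]
        sign = math.prod(1.0 if q >= 0 else -1.0 for q in others)
        out.append(sign * phi(sum(phi(abs(q)) for q in others)))
    return out


def oms_check_update(msgs, eta):
    """Eq. (5): offset min-sum; the minimum magnitude is reduced by eta and floored at zero."""
    out = []
    for i in range(len(msgs)):
        others = msgs[:i] + msgs[i + 1:]
        sign = math.prod(1.0 if q >= 0 else -1.0 for q in others)
        out.append(sign * max(min(abs(q) for q in others) - eta, 0.0))
    return out


if __name__ == "__main__":
    incoming = [2.0, -0.5, 1.5]
    print("SPA:", [round(r, 3) for r in spa_check_update(incoming)])
    print("OMS:", [round(r, 3) for r in oms_check_update(incoming, 0.15)])
```

Only comparisons, sign products, and one subtraction remain in the OMS update, with no nonlinear function evaluations.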


Layered Decoding Algorithm 


Another promising variation of the SPA is obtained by modifying the message processing
schedule, which is called layered decoding [27]. In this approach, an LDPC code is viewed as a concatenation
of m constituent codes, or layers. Consequently, a single LDPC decoder iteration consists of m successive
sub-iterations, one per constituent code, and the messages updated by one constituent code
are passed to the next constituent code to be processed in the next sub-iteration. As opposed to the
standard SPA, this technique uses updated messages sooner within the algorithm,
which leads to faster convergence of the LDPC decoder. Furthermore, processing messages in layers
reduces the memory requirement per decoding iteration.


Figure 12 shows a two-layer parity check matrix for a (12, 4) LDPC code. The rows of the parity check
matrix are grouped into non-overlapping subsets such that, within each subset, every column has weight
at most one.


[Matrix: 8 x 12 parity check matrix H partitioned into two layers]

Figure 12. Layered parity check matrix for a (12, 4) LDPC code


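The layered decoding algorithm can be applied to either SPA or OMS. The layered sum-product
algorithm (LSPA) is a simple variation of (1)-(3) and is given in (6)-(8). In this case, all $Q_i$ are first
initialized to $2y_i/\sigma^2$ and all $R_{ji}$ to zero. Then, for every check node $j$ in layer $k$ and every bit node
$i \in V(j)$, (6)-(8) are evaluated, one layer after another:

$$Q_{ij} = Q_i - R_{ji} \tag{6}$$

$$R_{ji} = \left(\prod_{i' \in V(j)\setminus i} \alpha_{i'j}\right)\,\phi\!\left(\sum_{i' \in V(j)\setminus i} \phi\left(\beta_{i'j}\right)\right) \tag{7}$$

$$Q_i = Q_{ij} + R_{ji} \tag{8}$$

where $\alpha_{i'j} = \mathrm{sign}(Q_{i'j})$ and $\beta_{i'j} = |Q_{i'j}|$ are now taken from (6).

In the case of the layered offset min-sum (LOMS) algorithm, equation (7) is replaced by (5).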
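A layered offset min-sum decoder following (6), (5), and (8) can be sketched in Python as below. This is an illustration under stated assumptions, not the decoder behind the results that follow: the offset eta = 0.15 is hypothetical, the layers are supplied by the caller, and the example uses the assumed (7,3) H of Figure 11, for which rows {c0, c3} can share a layer while c1 and c2 overlap with c0 and must form layers of their own.

```python
import math

# Assumed parity check matrix of the (7,3) example: rows c0..c3, columns v0..v6.
H = [
    [1, 1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 1],
]


def loms_decode(h, layers, y, noise_var, eta=0.15, max_iter=8):
    """Layered offset min-sum decoding; returns the hard-decision codeword."""
    m, n = len(h), len(h[0])
    V = [[i for i in range(n) if h[j][i]] for j in range(m)]
    for layer in layers:  # rows inside one layer must not share any column
        cols = [i for j in layer for i in V[j]]
        if len(cols) != len(set(cols)):
            raise ValueError(f"rows {layer} overlap in some column")
    Q = [2.0 * yi / noise_var for yi in y]             # Q_i initialized to 2 y_i / sigma^2
    R = {(j, i): 0.0 for j in range(m) for i in V[j]}  # R_ji initialized to zero
    c_hat = [1 if q < 0 else 0 for q in Q]
    for _ in range(max_iter):
        for layer in layers:
            for j in layer:
                q_ij = {i: Q[i] - R[(j, i)] for i in V[j]}                      # Eq. (6)
                for i in V[j]:
                    others = [q_ij[k] for k in V[j] if k != i]
                    sign = math.prod(1.0 if q >= 0 else -1.0 for q in others)
                    R[(j, i)] = sign * max(min(abs(q) for q in others) - eta, 0.0)  # Eq. (5)
                for i in V[j]:
                    Q[i] = q_ij[i] + R[(j, i)]                                  # Eq. (8)
        c_hat = [1 if q < 0 else 0 for q in Q]
        if all(sum(c_hat[i] for i in V[j]) % 2 == 0 for j in range(m)):
            break
    return c_hat
```

Because each Q_i is updated in place as soon as a layer is processed, later layers in the same iteration already see the corrected values, which is the source of the faster convergence described above.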


Simulation Results 


In the following, BER comparisons of the SPA, OMS, and LOMS decoding algorithms are provided for a
(1728, 864) LDPC code [27]. In the simulations, an AWGN channel with zero mean and variance $N_0/2$ is
assumed and BPSK modulation is employed. Simulations are carried out until 100 word errors are
collected for each SNR point.
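For such BER curves, each Eb/N0 point must be translated into the noise variance $\sigma^2 = N_0/2$ that also scales the channel LLRs $2y_i/\sigma^2$. The Python sketch below does this for unit-energy BPSK with code rate R, using $E_s = R E_b$ so that $\sigma^2 = 1/(2 R \, E_b/N_0)$, and generates noisy LLRs with Python's `random.Random.gauss`; the function names are hypothetical.

```python
import math
import random


def noise_var_from_ebn0(ebn0_db, code_rate):
    """Per-dimension noise variance sigma^2 = N0/2 for unit-energy BPSK at the given Eb/N0."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 1.0 / (2.0 * code_rate * ebn0)


def bpsk_awgn_llrs(bits, ebn0_db, code_rate, rng):
    """Map bit 0 -> +1 and bit 1 -> -1, add N(0, sigma^2) noise, and return LLRs 2y/sigma^2."""
    sigma2 = noise_var_from_ebn0(ebn0_db, code_rate)
    sigma = math.sqrt(sigma2)
    y = [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in bits]
    return [2.0 * yi / sigma2 for yi in y]


if __name__ == "__main__":
    for ebn0_db in (0.0, 1.0, 2.0, 3.0):
        print(f"Eb/N0 = {ebn0_db:.1f} dB -> sigma^2 = {noise_var_from_ebn0(ebn0_db, 0.5):.4f}")
```

For the rate-1/2 code, 0 dB corresponds to sigma^2 = 1; each additional 3 dB roughly halves it.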


Figure 13 illustrates the floating-point BER performance of the sum-product algorithm and the OMS algorithm with
various numbers of iterations. As observed, increasing the maximum number of SPA decoding iterations
from 8 to 16 provides around 1 dB of performance improvement, while doubling the iterations from 16 to 32
gains only 0.25 dB. Comparing OMS to SPA, although the reduced-complexity OMS decoding algorithm
introduces a slight performance loss at low SNR values, it performs 0.1 dB better than SPA in the
low-BER (high-SNR) region.
[Plot: 864x1728 (rate 1/2) SPA floating-point simulations, AWGN, BPSK]

Figure 13. BER performance of (1728,864) LDPC code using SPA and OMS algorithms


Figure 14 compares the BER performance of SPA with 16 iterations to the layered OMS algorithm with 8
and 16 iterations. As seen, the LOMS algorithm achieves approximately the same error rate performance as
SPA with only half the number of decoding iterations.


[Plot: 864x1728 (rate 1/2) simulations, AWGN, BPSK]

Figure 14. BER and WER performance comparison


A question that remains to be answered is whether the benefit of LDPC codes is worth the extra complexity
compared to convolutional codes. To answer this question, we need to design and architect a real-time
implementation of the LDPC code and compare its performance and complexity with those of the convolutional
code. We leave this task to Phase II of this effort.
5. Verification with Silvus DSP MIMO Testbed


The designed MIMO-OFDM system has been tested in real channels on an existing Silvus 2x2 MIMO DSP
testbed. Figure 15 shows a system block diagram of the testbed. The testbed operates in the 915 MHz ISM band with
26 MHz bandwidth. The analog part is implemented using COTS integrated radios. Communications algorithms
can be implemented either on the TI6416 DSP or on the host. The entire testbed is housed in two CompactPCI (cPCI)
chassis. For the tests conducted for this project, the same Matlab simulation code is reused to generate
and decode packets. The DSP is responsible for sending the generated packets to the DACs and for acquiring data from
the ADCs into the on-board memory. Packet decoding is done in non-real time; however, the signal
goes through actual channels and RF impairments. A graphical user interface (GUI) was developed to monitor the
field test and collect results. Figure 16 shows a snapshot of the GUI during a test. The GUI provides instant
information, such as the packet error rate, channel singular values, and SNR, as soon as a received packet is
decoded. It also enables users to observe the constellation, signal waveforms, and other intermediate receiver results.
The designed MIMO-OFDM system has been demonstrated working on this testbed. Both the throughput increase from
spatial multiplexing and the diversity gain from STBC were successfully demonstrated. For instance, the
screenshot clearly shows that STBC achieved much more reliable communications with 16QAM: in that
particular test, there were no errors for STBC-coded 16QAM and some errors for spatially multiplexed
16QAM. The SNR was around 18 dB.


Figure 15. System Block Diagram of Silvus DSP Testbed 
Figure 16. A Screenshot of the Field Test GUI


6. Design and Simulation for High Mobility Extension


Tracking a highly dynamic channel, such as the one experienced by a UAV, is a complicated problem.
Because of the frequency selectivity of the channel and the high mobility of the terminal in a high Doppler
environment, the channel suffers from both frequency and time dispersion. As a consequence of the rapidly
time-varying channel, more pilot symbols (PS) are needed in the time domain: the two-dimensional
(frequency and time) channel process must be sampled at greater than its Nyquist rate. To perform channel
estimation and tracking in a highly mobile environment, a new data-field structure, known as 2D
checkerboard-pattern pilot symbol assisted modulation (PSAM), has been designed and is shown in Figure 16.
[Diagram: MIMO-OFDM PSAM packet structure]

Figure 16. Data-Field PSAM Packet Structure


In practice, known symbols (pilot symbols, PS) are inserted at the transmitter, and the channel estimate is
obtained by interpolation. The channel estimator consists of linear combinations of the observations at the
PS locations. The simplest and best-performing way to estimate the channel in a MIMO environment is to
use orthogonal modulation on each of the transmit antennas. Let $x_p$ denote the observations of the pilot
values at the receiver and $c$ the channel at the positions of the data symbols (DS). Owing to the
orthogonality, the channel estimates can be obtained as

$$\hat{c} = E\left[c\,x_p^H\right]\mathrm{Cov}(x_p)^{-1}\,x_p = W x_p$$


where Cov(x_p) is the covariance matrix of the PS observations and W is the matrix of Wiener filter coefficients. The MIMO-OFDM PSAM channel estimator currently implemented in the Silvus software simulator uses a particular OFDM packet structure containing 12 OFDM symbols, as shown in Figure 16. This PS placement in the frequency-time grid enables the system to track a frequency roll across the frame. In some situations, the optimum interpolation filter for this sampling of the noisy channel response is a linear filter whose tap coefficients are a function of particular channel statistics. The Wiener filter coefficients are therefore often pre-computed, which results in an open-loop estimation structure with no acquisition time. The performance of this MIMO-OFDM PSAM channel estimation scheme was simulated, and the results are presented below.
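The pre-computation of W can be sketched as follows. This is a minimal illustration, not the Silvus simulator's code: it is restricted to one dimension (time) on a single sub-carrier and antenna pair, and it assumes a Jakes autocorrelation J0(2π f_D T_s Δn), unit channel power, and white noise of variance 1/SNR. The pilot indices, helper names, and the power-series J0 are hypothetical choices made for this example; the 20 dB SNR and 0.72% normalized Doppler match the design point quoted in this section.

```python
import math

def bessel_j0(x: float, terms: int = 30) -> float:
    """Power-series approximation of the Bessel function J0(x) (fine for small |x|)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            term *= -((x / 2.0) ** 2) / (k * k)
        total += term
    return total

def jakes_corr(lag: int, norm_doppler: float) -> float:
    """Jakes time autocorrelation J0(2*pi*f_D*T_s*lag) of a unit-power channel."""
    return bessel_j0(2.0 * math.pi * norm_doppler * lag)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def wiener_weights(pilots, target, norm_doppler, snr_db):
    """Weights w with c_hat(target) = sum_i w[i] * x_p[i].

    A row of W is E[c x_p^H] Cov(x_p)^-1; Cov(x_p) is real and symmetric here,
    so we solve Cov(x_p) w = r with r[i] = E[c(target) c(pilot_i)^*]."""
    noise_var = 10.0 ** (-snr_db / 10.0)
    cov = [[jakes_corr(p - q, norm_doppler) + (noise_var if i == j else 0.0)
            for j, q in enumerate(pilots)] for i, p in enumerate(pilots)]
    r = [jakes_corr(target - p, norm_doppler) for p in pilots]
    return solve(cov, r)

pilots = [0, 3, 8, 11]  # HYPOTHETICAL pilot symbol indices in a 12-symbol packet
# Open-loop structure: one weight vector per data position, computed once offline
table = {n: wiener_weights(pilots, n, 0.0072, 20.0) for n in range(12) if n not in pilots}
print({n: [round(w, 3) for w in ws] for n, ws in table.items()})
```

At run time the receiver only forms the inner product of a stored weight vector with the observed pilots, which is why this structure has no acquisition time.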
[Plot: 4x4 BICM, 16QAM, 50 ns RMS delay spread, PL = 12, 384 bytes; PSAM filter coefficients designed at 20 dB]
Figure 17. Packet Error Rate for 16QAM Rate-1/2 BICM at 500 MPH, Channel D
Figure 18. Packet Error Rate for 16QAM Rate-1/2 BICM at 500 MPH, Channel A
[Plot: 4x4 BICM, 64QAM, 50 ns RMS delay spread, PL = 12, 576 bytes; PSAM filter coefficients designed at 20 dB]
Figure 19. Packet Error Rate for 64QAM Rate-1/2 BICM at 500 MPH, Channel D


In our simulations, we assume equal numbers of transmit and receive antennas (i.e., a 4x4 system with 4 spatial streams). Most OFDM PHY parameters are compatible with the ongoing IEEE 802.11n next-generation WLAN proposal, EWC PHY specification v1.13. In particular, we assume the following:


• 20 MHz bandwidth
• 2.4 GHz carrier frequency
• Sub-carrier spacing of 312.5 kHz
• Bit-interleaved coded modulation (BICM) with a binary convolutional code (BCC) and QPSK, 16QAM, and 64QAM
• A time-varying (TV) Jakes model to simulate both frequency-flat (FF) and frequency-selective (FS) fading
• Terminal mobility of 500 MPH (i.e., 0.72% normalized Doppler)
• Wiener filter coefficients pre-computed at 20 dB SNR and 0.72% normalized Doppler
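The 0.72% normalized Doppler follows from the carrier frequency and terminal speed. The check below assumes the normalization is f_D·T_s with a 4 µs OFDM symbol (a 3.2 µs useful part plus a 0.8 µs guard interval, as in 802.11a/n at 20 MHz) and a 64-point FFT; neither the symbol duration nor the FFT size is stated in this report.

```python
MPH_TO_MPS = 0.44704            # 1 mile per hour in metres per second (exact)
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def max_doppler_hz(speed_mph: float, carrier_hz: float) -> float:
    """Maximum Doppler shift f_D = v * f_c / c."""
    return speed_mph * MPH_TO_MPS * carrier_hz / SPEED_OF_LIGHT

def normalized_doppler(speed_mph: float, carrier_hz: float, symbol_s: float) -> float:
    """Doppler normalized to the OFDM symbol duration: f_D * T_s."""
    return max_doppler_hz(speed_mph, carrier_hz) * symbol_s

bandwidth_hz = 20e6
fft_size = 64    # ASSUMED 802.11-style 64-point FFT
symbol_s = 4e-6  # ASSUMED 3.2 us useful symbol + 0.8 us guard interval

print(f"sub-carrier spacing = {bandwidth_hz / fft_size / 1e3} kHz")
fd = max_doppler_hz(500, 2.4e9)
nd = normalized_doppler(500, 2.4e9, symbol_s)
print(f"f_D = {fd:.0f} Hz, normalized Doppler = {100 * nd:.2f}%")
```

Under these assumptions the sketch gives f_D ≈ 1.79 kHz and f_D·T_s ≈ 0.72%, consistent with the value listed above, and the 64-point FFT reproduces the 312.5 kHz sub-carrier spacing.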


We compute the packet error rate (PER). Each packet consists of 1536 symbols; depending on the modulation and coding scheme (MCS), a different number of information bytes can be sent. We further assume perfect timing synchronization and no frequency offset.
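The payload sizes in the plot titles (384 bytes for 16QAM and 576 bytes for 64QAM, both rate-1/2) follow from the 1536 symbols per packet: each symbol carries log2(M) coded bits, of which a fraction R are information bits. A small sketch of that arithmetic, together with PER as the ratio of failed to transmitted packets; the function names are illustrative only.

```python
import math
from fractions import Fraction

SYMBOLS_PER_PACKET = 1536  # from the simulation setup described above

def info_bytes(qam_order: int, code_rate: Fraction, symbols: int = SYMBOLS_PER_PACKET) -> int:
    """Information bytes per packet: symbols x log2(M) coded bits x code rate / 8."""
    coded_bits = symbols * int(math.log2(qam_order))
    return int(coded_bits * code_rate) // 8

def packet_error_rate(packet_ok) -> float:
    """PER = (packets received in error) / (packets sent)."""
    outcomes = list(packet_ok)
    return sum(1 for ok in outcomes if not ok) / len(outcomes)

for name, order in (("QPSK", 4), ("16QAM", 16), ("64QAM", 64)):
    print(name, "rate-1/2:", info_bytes(order, Fraction(1, 2)), "bytes")
```

Using `Fraction` for the code rate keeps rates such as 2/3 exact, so the byte count is not affected by floating-point rounding.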


Figure 17 presents a PER comparison between perfect channel state information (PCSI) and PSAM for 16QAM rate-1/2 BICM under Channel D, a non-line-of-sight (NLOS) FS channel, at 500 MPH. At 1% PER, we observe that PSAM is only about 1.5 dB away from PCSI. With the proposed packet structure, the receiver can reconstruct the channel state information fairly accurately even at a terminal speed of 500 MPH.
Figure 18 presents a PER comparison between PCSI and PSAM for 16QAM rate-1/2 BICM under Channel A, an FF channel, also at 500 MPH. In this case, PSAM performs very close to PCSI, with a degradation of no more than 0.2 dB. Because the channel is frequency flat and varies only in time (due to the high Doppler), the original 2D frequency-time interpolation reduces to a one-dimensional interpolation in time. With much finer effective resolution in time, PSAM performance improves.


Figure 19 presents a PER comparison between PCSI and PSAM for 64QAM rate-1/2 BICM under Channel D, an NLOS FS channel, at 500 MPH. In this case, we observe that PSAM has an error floor at about 10% PER. The reason for the error floor is twofold. First, a higher-order constellation not only requires a higher SNR but is also more sensitive to channel estimation errors, because the denser grid on the constellation plane reduces the pairwise Euclidean distance. Second, the current packet design assumes a pilot-tone placement based on a 4x4 space-time block of Walsh-Hadamard orthogonal sequences. At 500 MPH terminal mobility, which corresponds to a normalized Doppler of 0.72%, the orthogonality assumption is broken, resulting in an error floor. Further improving the performance of 64QAM would require a different pilot symbol pattern.
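To illustrate how time variation breaks the pilot orthogonality, the sketch below sends a 4x4 Walsh-Hadamard pilot block from four transmit antennas over four consecutive OFDM symbols and despreads it at one receive antenna. The channel gains, the linear drift used to model channel change within the block, and its magnitude are hypothetical; the sketch is noise-free so that only the orthogonality loss is visible.

```python
def hadamard(n: int):
    """n x n Walsh-Hadamard matrix by Sylvester's construction (n a power of 2)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def despread(pilots, channel):
    """Correlator channel estimate at one receive antenna.

    pilots[n][k] is the +/-1 pilot sent by transmit antenna k in OFDM symbol n;
    channel(n, k) is the noise-free gain from antenna k during symbol n.
    Returns one estimate per transmit antenna."""
    n_sym, n_tx = len(pilots), len(pilots[0])
    rx = [sum(channel(n, k) * pilots[n][k] for k in range(n_tx)) for n in range(n_sym)]
    return [sum(pilots[n][k] * rx[n] for n in range(n_sym)) / n_sym for k in range(n_tx)]

P = hadamard(4)                                         # rows: symbols, columns: Tx antennas
h0 = [1 + 0.5j, -0.3 + 0.8j, 0.7 - 0.2j, -0.6 - 0.4j]  # hypothetical block-average gains
drift = [0.05, -0.04, 0.06, 0.03]                       # hypothetical change per symbol

static = despread(P, lambda n, k: h0[k])
moving = despread(P, lambda n, k: h0[k] + drift[k] * (n - 1.5))  # zero-mean linear drift

print("static max error:", max(abs(e - h) for e, h in zip(static, h0)))
print("moving max error:", max(abs(e - h) for e, h in zip(moving, h0)))
```

With a static channel the despread estimates are exact. With a drifting channel, each antenna's own drift averages out over the block, but the drift of the other antennas leaks into its estimate through the non-zero correlation between the time ramp and the products of Walsh columns. This cross-antenna leakage acts as an estimation error that does not decrease with SNR, consistent with the error floor described above.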


Our preliminary results show that reliable 4x4 16QAM communication at high mobility can be achieved with time-frequency domain channel tracking. The problem is more challenging with 64QAM; we leave it to the Phase II effort, in which we would optimize the pilot symbol values and placement.


7. Preparation for Real-time Implementation on FPGA 


The Silvus-UCLA team also investigated a possible real-time implementation. A real-time implementation offers the following benefits:
• It is the closest to an actually deployed system and predicts the achievable performance more accurately.
• It enables more extensive field tests.
• It enables field tests in a networked environment.
• It enables field tests of certain PHY algorithms, such as feedback MIMO, that are otherwise impossible or inaccurate to test. Feedback MIMO could significantly improve system capacity and anti-jam performance.


Two important tasks have been completed: identifying the development platform and mapping the key floating-point algorithms to fixed-point algorithms.


• The Silvus team conducted an extensive search for available COTS FPGA development platforms and identified a set of FPGA boards from Nallatech (www.nallatech.com) that offer enough processing power for an advanced 4x4 MIMO communication system with 40 MHz bandwidth. The identified FPGA development platform is in the CompactPCI (cPCI) form factor and supports a scalable architecture: up to 4 high-performance Xilinx FPGAs and 8 DACs and ADCs can be supported on a single cPCI board. This platform provides enough processing performance as well as a simple analog baseband I/Q interface. A complete 4x4 MIMO communication system could easily be put together using a COTS RF product such as the MAX2829.


• A fixed-point architecture has been defined for the major receiver blocks. Fixed-point algorithms have been designed and implemented for most receiver algorithms that involve floating-point arithmetic. The most critical part of the receiver is perhaps the MIMO detection; here we present simulation results for our fixed-point MMSE MIMO detection.
Figure 17. Performance of Fixed-Point MMSE MIMO Detection


Our fixed-point MMSE MIMO detection is implemented using QR decomposition followed by a set of matrix-vector multiplications. Figure 17 shows the performance of the fixed-point implementation in a 4x4 64QAM system with a rate-2/3 convolutional code. This system achieves an information bit rate of 192 Mbps over a 20 MHz bandwidth and places very strict accuracy requirements on the MIMO detection. The number of bits indicated in the plot is the number of bits used in the QR decomposition; the matrix-vector multiplications can be implemented with fewer bits without noticeable performance loss. With a 16-bit implementation, the performance is almost identical to the floating-point implementation for PER above 1% and suffers only a 0.5 dB loss at 0.1% PER. The 14-bit implementation loses more at high PER but is a good choice if the PER requirement is low.
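A minimal sketch of QR-based MMSE detection with a crude model of finite word length. It uses the standard augmented-matrix formulation (QR decomposition of [H; σI], then x̂ = R⁻¹Q₁ᴴy by back substitution), which is one common way to realize QR-based MMSE detection; the report does not detail its fixed-point architecture. The rounding model (rounding Q and R to `frac_bits` fractional bits after a floating-point decomposition), the test channel, and all names are assumptions. Because `frac_bits` counts fractional bits only, it is not directly comparable to the 14- and 16-bit word lengths in the plot.

```python
import math

def quantize(z: complex, frac_bits: int) -> complex:
    """Round real and imaginary parts to a 2**-frac_bits grid (no saturation modelled)."""
    s = 2.0 ** frac_bits
    return complex(round(z.real * s) / s, round(z.imag * s) / s)

def qr_mgs(a):
    """Thin QR of an m x n complex matrix (list of rows) by modified Gram-Schmidt."""
    m, n = len(a), len(a[0])
    q = [list(row) for row in a]  # columns are orthonormalized in place
    r = [[0j] * n for _ in range(n)]
    for k in range(n):
        norm = math.sqrt(sum(abs(q[i][k]) ** 2 for i in range(m)))
        r[k][k] = complex(norm)
        for i in range(m):
            q[i][k] /= norm
        for j in range(k + 1, n):
            r[k][j] = sum(q[i][k].conjugate() * q[i][j] for i in range(m))
            for i in range(m):
                q[i][j] -= r[k][j] * q[i][k]
    return q, r

def mmse_detect(h, y, sigma, frac_bits=None):
    """x_hat = (H^H H + sigma^2 I)^-1 H^H y, computed as R^-1 Q1^H y where
    [H; sigma I] = Q R. If frac_bits is given, Q and R are rounded first."""
    m, n = len(h), len(h[0])
    aug = [list(row) for row in h] + [[complex(sigma) if i == j else 0j for j in range(n)]
                                      for i in range(n)]
    q, r = qr_mgs(aug)
    if frac_bits is not None:
        q = [[quantize(v, frac_bits) for v in row] for row in q]
        r = [[quantize(v, frac_bits) for v in row] for row in r]
    # Q^H [y; 0] only needs the first m rows of Q
    z = [sum(q[i][k].conjugate() * y[i] for i in range(m)) for k in range(n)]
    x = [0j] * n
    for k in range(n - 1, -1, -1):  # back substitution on upper-triangular R
        x[k] = (z[k] - sum(r[k][j] * x[j] for j in range(k + 1, n))) / r[k][k]
    return x

# Hypothetical diagonally dominant 4x4 channel and unnormalized 16QAM-style symbols
H = [[2.0 + 0.1j, -0.3 + 0.4j, 0.2 - 0.5j, 0.1 + 0.2j],
     [0.2 - 0.3j, 2.1 + 0.2j, -0.4 + 0.1j, 0.3 - 0.2j],
     [-0.1 + 0.4j, 0.3 + 0.3j, 1.8 - 0.2j, -0.5 + 0.1j],
     [0.4 + 0.1j, -0.2 - 0.1j, 0.3 + 0.3j, 2.0 - 0.4j]]
x = [1 + 1j, -1 + 3j, 3 - 1j, -3 - 3j]
y = [sum(H[i][k] * x[k] for k in range(4)) for i in range(4)]  # noise-free
for bits in (None, 14, 10):
    # sigma = 0 reduces MMSE to zero forcing, so the float result should match x
    est = mmse_detect(H, y, sigma=0.0, frac_bits=bits)
    print(bits, max(abs(a - b) for a, b in zip(est, x)))
```

Sweeping `frac_bits` in a link-level simulation, rather than in this noise-free check, is how a word length would actually be chosen against a PER target.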


8. Conclusion 


During the Phase I period of this contract, we successfully designed a MIMO-OFDM packet structure that supports non-line-of-sight communication in urban warfare. The packet structure supports a variety of features that are configurable on a packet-by-packet basis. These configurable features make it possible to optimize the tradeoff between bandwidth efficiency, reliability, and complexity in diverse warfare environments. A complete end-to-end physical layer simulation platform has been constructed in Matlab. The feasibility of the developed packet structure and receiver algorithms has been verified on the Silvus DSP MIMO Testbed. We have also completed fixed-point implementations of all key receiver modules and identified a cPCI-based FPGA platform for real-time implementation.


The work completed in this period has laid a solid foundation for Phase II. We are fully prepared to implement, on an FPGA platform, a real-time MIMO packet communication system that meets the needs of non-line-of-sight urban warfare communications.